Multi-Conditional Constraint Generative Adversarial Network-Based MR Imaging from CT Scan Data

Magnetic resonance (MR) imaging is an important computer-aided diagnosis technique with rich pathological information, but physical and physiological constraints seriously limit its applicability. By contrast, computed tomography (CT)-based radiotherapy is more popular on account of its imaging rapidity and simpler environmental requirements. It is therefore of great theoretical and practical significance to design a method that can construct an MR image from the corresponding CT image. In this paper, we treat MR imaging as a machine vision problem and propose a multi-conditional constraint generative adversarial network (GAN) for MR imaging from CT scan data. Considering the reversibility of GAN, both a generator and a reverse generator are designed for MR and CT imaging, respectively, which constrain each other and improve the consistency between the features of the CT and MR images. In addition, we innovatively treat the discrimination between real and generated MR images as object re-identification; a cosine error fused with the original GAN loss is designed to enhance the verisimilitude and textural features of the MR image. Experimental results on the challenging public CT-MR image dataset show a distinct performance improvement over other GANs utilized in medical imaging and demonstrate the effectiveness of our method for medical image modal transformation.


Introduction
Magnetic resonance (MR) imaging [1] and computed tomography (CT) [2], both of which are suitable for the inspection of lesions in various tissues throughout the body, are commonly used computer-aided medical imaging diagnostic techniques. Because it is non-invasive, radiation-free, and multi-contrast, and because it contains more pathological information, MR imaging is more often preferred for the diagnosis of most diseases compared with CT imaging [3], especially for soft tissues, ligaments, and organs. However, since MR imaging must be performed in an airtight space with a strong magnetic field and takes about half an hour, the procedure is contraindicated for some patients, such as those with claustrophobia, cardiac pacemakers, or artificial joints [4]. By contrast, CT scanning does not need to be carried out in an airtight environment and can be finished within a few minutes, which is easier for patients to endure. Therefore, it is significant and valuable to develop a method that can estimate an MR image from its corresponding CT scan data. Both MR images and CT images are single-channel digital images showing the anatomy and pathology of each tissue structure; however, they belong to different modalities due to their different imaging principles [5]. Therefore, we can treat MR imaging from CT scan data as a medical image modal transformation problem, which is one of the fundamental topics in computer vision.
The main contributions of this paper are as follows:
1. We treat MR image synthesis as an object re-identification problem and introduce a cosine loss, which is combined with the voxel error and perception error as the model objective function.
2. We modify the generator based on CycleGAN and design the discriminator based on PatchGAN under the constraint of the paired CT-MR dataset.

We design the objective function and modify the GAN architecture to optimize the quality of the generated MR image. The remainder of this paper is organized as follows. Section 2 introduces the working principle of GAN and conditional GAN; Section 3 introduces the proposed method in detail, including the multi-conditional constraint GAN structure and the objective function design; Section 4 demonstrates the experimental results and discusses the comparative analysis; Section 5 concludes the paper.


The Working Principle of GAN and cGAN
Standard GAN is introduced as a typical unsupervised learning method to train a generative model. As in Figure 2a, the framework of GAN contains a pair of competing models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample comes from the training data rather than G. To learn a generator distribution p_g over data x, the generative model builds a mapping function from a prior noise distribution p_z to data space as G(z). The discriminative model outputs a single scalar representing the probability that x comes from the training data rather than p_g. G and D are trained simultaneously: G adjusts its parameters to minimize log(1 − D(G(z))), while D adjusts its parameters to maximize log D(x), as if they are following a two-player min-max game with value function L(D, G):

min_G max_D L(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))] (1)

Standard GAN is an unsupervised learning model that cannot control the category of the generated image. However, it can be extended to a conditional model (cGAN) [32] if both the generative model and the discriminative model are conditioned on some extra constraint information y, as shown in Figure 2b. The condition y, which can be any kind of auxiliary information, including data from other modalities, is fed into both the generative model and the discriminative model as an additional input layer to perform the conditioning. In the generative model, the prior noise p_z and y are combined in a joint hidden representation, and the adversarial training framework allows considerable flexibility in how this hidden representation is composed. In the discriminative model, x and y are presented as inputs to a discriminative function. The objective function is again a two-player min-max game, as in Equation (2):

min_G max_D L(D, G) = E_{x∼p_data(x)}[log D(x|y)] + E_{z∼p_z(z)}[log(1 − D(G(z|y)))] (2)

In cGAN, the input of the generative model can also be an arbitrary image [33,34] besides p_z [32]. In this paper, we use the CT image as input and its corresponding real MR image as the constraint to design a special cGAN structure, so as to realize the modal transformation from CT image to the corresponding MR image.
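For concreteness, the cGAN objective in Equation (2) can be written as two binary cross-entropy terms. The following is a minimal sketch, not the paper's implementation; the discriminator interface D(image, condition) and the non-saturating generator form are assumptions:

```python
# Sketch of the cGAN min-max objective as binary cross-entropy terms.
# D is any discriminator module taking an (image, condition) pair and
# returning per-sample logits; these interfaces are assumed, not the paper's.
import torch
import torch.nn.functional as F

def d_loss(D, x_real, x_fake, y_cond):
    # D maximizes log D(x|y) + log(1 - D(G(z|y))): real pairs -> 1, fakes -> 0.
    s_real = D(x_real, y_cond)
    s_fake = D(x_fake.detach(), y_cond)   # detach: do not update G here
    return (F.binary_cross_entropy_with_logits(s_real, torch.ones_like(s_real))
            + F.binary_cross_entropy_with_logits(s_fake, torch.zeros_like(s_fake)))

def g_loss(D, x_fake, y_cond):
    # G minimizes log(1 - D(G(z|y))), here in the usual non-saturating form.
    s_fake = D(x_fake, y_cond)
    return F.binary_cross_entropy_with_logits(s_fake, torch.ones_like(s_fake))
```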


Multi-Condition Constraint GAN Model
The modal transformation process of the image is reversible, which means CT and MR images can be converted to each other based on different generators with the corresponding constraint. Inspired by CycleGAN, this paper constructs a multi-conditional constraint GAN model that includes a generator, an inverse generator, and two discriminators. The proposed model optimizes the training model with the minimum error between the real and the generated MR image, which can improve the fidelity and detailed characteristics of the generated MR image. Since both the real and generated MR image belong to the same modality, we innovatively treat the real and generated MR image discrimination as an object re-identification problem. Due to this, cosine error fusing with original GAN loss is designed to enhance verisimilitude and textural features of the MR image.

Model Architecture
This paper builds a multi-conditional constraint GAN model to realize MR image generation with CT images as input. As shown in Figure 3, the model takes real CT images I_CT as input and real MR images I_MR as constraint to generate MR images G_MR(I_CT) by the generator G_MR. On this basis, G_MR(I_CT) is used as input with I_CT as constraint to generate CT images G_CT(G_MR(I_CT)) by the inverse generator G_CT. The discriminators D_MR and D_CT should not only distinguish the authenticity of the input MR and CT images, but also discriminate whether the image has a corresponding relationship with the input image.
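The data flow just described can be summarized in a short sketch; the module names and the discriminator call signature D(condition, candidate) are hypothetical, chosen only to mirror the notation D_MR(I_CT, ·) and D_CT(I_CT, ·):

```python
# Sketch of the forward pass in Figure 3 (assumed interfaces, not the
# paper's code): G_MR maps CT -> MR, G_CT maps the generated MR back to CT,
# and the discriminators judge (condition, candidate) pairs.
def forward_cycle(G_MR, G_CT, D_MR, D_CT, I_CT, I_MR):
    fake_MR = G_MR(I_CT)        # generated MR image G_MR(I_CT)
    rec_CT = G_CT(fake_MR)      # inverse generator output G_CT(G_MR(I_CT))
    # Real pairs should be scored close to 1, generated pairs close to 0,
    # with the input CT acting as the correspondence condition.
    scores = {
        "d_mr_real": D_MR(I_CT, I_MR),
        "d_mr_fake": D_MR(I_CT, fake_MR),
        "d_ct_real": D_CT(I_CT, I_MR),
        "d_ct_fake": D_CT(I_CT, rec_CT),
    }
    return fake_MR, rec_CT, scores
```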

Both the generator and the inverse generator applied in our method include an encoder and a decoder, where the encoder is used to extract features from the image and the decoder is used to generate an image with the same scale as the input. This process is implemented separately through convolution and deconvolution operations. The generator and inverse generator in our proposed multi-conditional constraint GAN model are both improved based on the image transformation network [35], which produces impressive results in real-time style transfer and single-image super-resolution. The network contains two stride-one convolutions at the beginning and the end of the architecture, two stride-two convolutions, nine residual blocks [36], and two fractional convolutions with stride 0.5. The nine residual modules are intended to deepen the network and expand the receptive field, so as to obtain more semantic information and extract detailed features of medical images. Each residual block includes two convolutions using 256 filters of 3 × 3 size with reflection padding, which effectively avoids boundary artifacts and ensures the sharpness of the generated images. Instance normalization [37] and a rectified linear unit (ReLU) follow each convolution except the final one. The hyperbolic tangent (Tanh) [38] activation function follows the final convolution to guarantee the output is within [−1, 1]. The specific structure is shown in Table 2.

Table 2. Model architecture of the generator G_MR and inverse generator G_CT (columns: layer name, output size, filter size/stride, number of conv. layers).
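Under the description above, the generator can be sketched as follows; the 256 filters in the residual blocks, the instance normalization, the Tanh output, and the layer counts follow the text, while the 7 × 7 head/tail kernels and the 64/128 intermediate widths are assumptions:

```python
# Sketch of the generator / inverse generator (assumed details noted above).
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3),   # 3x3, reflection pad
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1), nn.Conv2d(ch, ch, 3),
            nn.InstanceNorm2d(ch))

    def forward(self, x):
        return x + self.body(x)

def build_generator(in_ch=1, out_ch=1):
    layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, 64, 7),   # stride-1 head
              nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
              nn.Conv2d(64, 128, 3, stride=2, padding=1),       # two stride-2 convs
              nn.InstanceNorm2d(128), nn.ReLU(inplace=True),
              nn.Conv2d(128, 256, 3, stride=2, padding=1),
              nn.InstanceNorm2d(256), nn.ReLU(inplace=True)]
    layers += [ResidualBlock(256) for _ in range(9)]            # nine residual blocks
    layers += [nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
               nn.InstanceNorm2d(128), nn.ReLU(inplace=True),   # "0.5-stride" convs
               nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
               nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
               nn.ReflectionPad2d(3), nn.Conv2d(64, out_ch, 7), # stride-1 tail
               nn.Tanh()]                                       # output in [-1, 1]
    return nn.Sequential(*layers)
```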

The design of the discriminator is inspired by PatchGAN, proposed by Isola [39], which aims to classify small overlapping image patches rather than whole images. Compared with other discriminators, this patch-level discriminator has fewer parameters and can emphasize detailed information in local areas. The discriminator takes N × N fragments as input instead of the entire image, so it can pay more attention to the high-frequency information of the image, which in turn encourages the model to generate more realistic images. In the GAN model, the process by which the generator generates images is much more complicated than the process by which the discriminator judges image authenticity. To balance the performance of the generator and the discriminator, the discriminator designed in this paper is a shallow network containing five convolutional layers; the specific structure is shown in Figure 4. In the second to fourth convolutional layers, normalization and non-linear processing based on leaky ReLU [40] are performed after each convolution. The discriminator not only needs to distinguish the authenticity of the input MR image, but also needs to determine whether the MR image and the input CT image have a corresponding relationship.
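A minimal sketch of such a five-layer patch discriminator is given below; the 4 × 4 kernels, channel widths, and the two-channel (CT plus MR stacked) input are assumptions in common PatchGAN style, not the paper's exact configuration:

```python
# Sketch of a five-layer patch discriminator (assumed details noted above).
import torch.nn as nn

def build_discriminator(in_ch=2):  # CT + MR concatenated as two channels (assumed)
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),   # layer 1: no norm
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1),     # layers 2-4:
        nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(128, 256, 4, stride=2, padding=1),    # norm + leaky ReLU
        nn.InstanceNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(256, 512, 4, stride=1, padding=1),
        nn.InstanceNorm2d(512), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(512, 1, 4, stride=1, padding=1))      # layer 5: patch logits
```

Each output element scores one overlapping receptive-field patch of the input pair, so the loss is averaged over patch scores rather than over a single image-level decision.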

Loss Function
Both networks in GANs are trained simultaneously. The discriminators D_MR and D_CT are used to estimate the probability of a sample coming from real data. The goal of the generator G_MR is to generate an MR image using the CT image as input under the constraint of the real MR image, whereas G_CT generates the CT image as a reversed constraint to impel G_MR to generate a more realistic MR image. Since the model contains two conditional adversarial pairs, the adversarial loss is defined as:

L_GAN = E[log D_MR(I_CT, I_MR)] + E[log(1 − D_MR(I_CT, G_MR(I_CT)))] + E[log D_CT(I_CT, I_MR)] + E[log(1 − D_CT(I_CT, G_CT(G_MR(I_CT))))] (3)

where I_CT and I_MR are the real CT and MR images; D_MR(I_CT, I_MR) and D_CT(I_CT, I_MR) represent the probability of the MR and CT images coming from real data, and D_MR(I_CT, G_MR(I_CT)) and D_CT(I_CT, G_CT(G_MR(I_CT))) represent the probability of those coming from the generated ones. Typically, adversarial loss can generate visually satisfactory MR images. However, MR image modal transformation should not only generate an MR image, but also render it with the richer pathological information corresponding to the content and features of the CT image. Therefore, the purpose of medical image modal transformation is to make the generated image contain more detailed texture features on the premise of similarity to the real image structure. For the paired data {I_CT, I_MR}, the generator G_MR is tasked with generating realistic MR images that are close to the reference I_MR of the input I_CT. Although the inverse generator G_CT is not required as a final product, adding the same constraint to G_CT enables a higher quality of the generated MR image. Considering the above factors comprehensively, the constraints on the similarity of image structure and detailed texture features, reflected by the voxel loss and perception loss between the generated and the real CT image and between the generated and the real MR image, are defined as:

L_voxel = ‖I_MR − G_MR(I_CT)‖_1 + ‖I_CT − G_CT(G_MR(I_CT))‖_1 (4)

L_perc = (1/K) Σ_{j=1}^{K} [1/(C_j H_j W_j)] (‖ϕ_j(I_MR) − ϕ_j(G_MR(I_CT))‖_1 + ‖ϕ_j(I_CT) − ϕ_j(G_CT(G_MR(I_CT)))‖_1) (5)

where ϕ represents the VGG16 [41] model used to extract perception features; ϕ_j(G_MR(I_CT)) and ϕ_j(G_CT(G_MR(I_CT))) represent the semantic features of the generated MR image and CT image, respectively; ϕ_j(I_CT) and ϕ_j(I_MR) represent the semantic features of the real CT image and MR image; H_j, W_j, and C_j represent the height, width, and depth of the feature map of the j-th convolutional layer of VGG16; and K is the number of convolutional network layers. In our work, the goal of the generator is to generate MR images using CT images as input under the constraint of the real MR image, which means the generated MR image should be similar to the real MR image in structure and detailed texture features. Inspired by this, the discrimination between the real and the generated MR image can be viewed as object re-identification. The cosine loss function pushes all samples away from the decision boundary towards their parametrized class mean direction, which means it can not only classify the object, but also make the features of the same class converge, which is beneficial for intra-class classification.
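As an illustration of Equations (4) and (5), the following sketch computes the voxel and perception terms for one image pair; the particular VGG16 layers used as ϕ_j and the omission of ImageNet input normalization are simplifying assumptions:

```python
# Sketch of the voxel (Eq. (4)) and perception (Eq. (5)) terms.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

_vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in _vgg.parameters():
    p.requires_grad_(False)
_LAYERS = {3, 8, 15, 22}  # relu1_2..relu4_3 as phi_j (K = 4, an assumed choice)

def voxel_loss(fake, real):
    return F.l1_loss(fake, real)  # mean absolute difference over all voxels

def perception_loss(fake, real):
    # Single-channel medical images are repeated to the 3 channels VGG expects.
    x, y = fake.repeat(1, 3, 1, 1), real.repeat(1, 3, 1, 1)
    total = 0.0
    for idx, layer in enumerate(_vgg):
        x, y = layer(x), layer(y)
        if idx in _LAYERS:
            # l1_loss averages over elements, i.e. ~ ||.||_1 / (C_j H_j W_j)
            total = total + F.l1_loss(x, y)
        if idx == max(_LAYERS):
            break
    return total / len(_LAYERS)
```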
Intra-class object re-identification can be regarded as a classification problem among different individuals in the same object category. However, traditional object classification is mostly inter-class classification, which usually maps objects randomly to the boundary. Inter-class classification does not require intra-class compactness and inter-class separation, which means it cannot achieve better intra-class object re-identification. The von Mises-Fisher (vMF) distribution does better in exploring the intrinsic relationship between the data posterior loss and the prior distribution, which makes it suitable as the potential distribution structure of the data for intra-class object re-identification. In order to effectively discriminate the generated and real MR images, we propose to use the vMF distribution and learn a mapping that maps input samples to the feature space R. In addition, we explore the relationship between the data posterior loss and the prior distribution based on Bayesian theory. Under the vMF prior, the posterior probability of a sample in the feature space R is:

p(y = k | f(x)) = exp(κ ŵ_k^T f(x)) / Σ_{c=1}^{C} exp(κ ŵ_c^T f(x)) (6)

where κ is the shared concentration parameter, ŵ_k^T f(x) represents the objective evaluation of a sample for class k, C is the number of categories, and ŵ_c^T f(x) represents the objective evaluation for the different categories.
The vMF distribution is an isotropic probability distribution on the unit sphere that peaks around the direction ŵ_k and decays as the cosine similarity decreases. The corresponding cosine loss function can be written as:

L_cos = −(1/N) Σ_{i=1}^{N} log [ exp(κ ŵ_{y_i}^T f(x_i)) / Σ_{c=1}^{C} exp(κ ŵ_c^T f(x_i)) ] (7)

where N is the number of samples. This paper mainly involves intra-class recognition between real and generated MR images, as well as between real and generated CT images. Therefore, the cosine loss function includes the cosine error between the real MR image and the generated MR image and the cosine error between the real CT image and the generated CT image:

L_cos(G_MR, G_CT) = L_cos(I_MR, G_MR(I_CT)) + L_cos(I_CT, G_CT(G_MR(I_CT))) (8)

Combining the above optimizations, the objective function of the MR modal transformation based on the multi-conditional constraint GAN model with the CT image as input can be defined as:

L(G_MR, G_CT, D_MR, D_CT) = L_GAN(G_MR, G_CT, D_MR, D_CT) + λ_voxel · L_voxel(G_MR(I_CT), G_CT(G_MR(I_CT))) + λ_perc · L_perc(G_MR(I_CT), G_CT(G_MR(I_CT))) + λ_cos · L_cos(G_MR(I_CT), G_CT(G_MR(I_CT))) (9)

where λ_voxel, λ_perc, and λ_cos are all hyperparameters, which are used to balance the various objective terms. We aim to solve:

G_MR*, G_CT* = arg min_{G_MR, G_CT} max_{D_MR, D_CT} L(G_MR, G_CT, D_MR, D_CT) (10)

In summary, the MR modal transformation algorithm based on the multi-conditional constraint GAN model with CT images as input can be described in Algorithm 1.
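A minimal reading of Equations (6) and (7) in code form is sketched below; the feature extractor f, the explicit L2 normalization of features and class directions, and the value of κ are assumptions for illustration:

```python
# Sketch of the vMF-based cosine loss in Eqs. (6)-(7): normalized features
# and class directions, kappa-scaled cosine similarities, and the negative
# log posterior averaged over N samples (i.e., cross-entropy over classes).
import torch
import torch.nn.functional as F

def cosine_loss(features, class_dirs, labels, kappa=16.0):  # kappa assumed
    f = F.normalize(features, dim=1)      # f(x) on the unit sphere
    w = F.normalize(class_dirs, dim=1)    # class mean directions w_c
    logits = kappa * f @ w.t()            # kappa * w_c^T f(x), one per class
    return F.cross_entropy(logits, labels)  # -(1/N) sum log posterior
```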

Results and Analysis
The brain CT-MR dataset published by SZSPH is used to evaluate our proposed method; it includes 367 pairs of CT-MR data with corresponding relationships. We randomly select 257 pairs as training data and the remaining 110 pairs as test data. All samples in the dataset are resized to 256 × 256. The experimental environment is an Intel(R) Core(TM) i7-9700 CPU @ 3.20 GHz, 16 GB RAM, and an NVIDIA TITAN Xp GPU. The multi-conditional constraint GAN was trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of one. The number of iterations is set to 25,700 (100 epochs) during model training; all the networks are trained at a learning rate of 0.0002 for the first 12,850 iterations, after which the rate linearly decays to zero over the following 12,850 iterations. For all experiments, the following empirical values were used to train the model: λ_voxel = 100, λ_perc = 1, λ_cos = 1.
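The stated schedule and weights can be wired up as in the following sketch; the stand-in generator and the reduction of the objective to its voxel term (so that the loop runs in isolation) are deliberate simplifications, not the full Equation (9):

```python
# Sketch of the stated training configuration: SGD, mini-batch size 1,
# lr = 2e-4 held for 12,850 iterations, then linear decay to zero.
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Conv2d(1, 1, 3, padding=1)  # stand-in for G_MR's parameters

def lr_lambda(it, hold=12_850, decay=12_850):
    # Multiplier of the base lr: 1.0 for the first half, then linear decay.
    return 1.0 if it < hold else max(0.0, 1.0 - (it - hold) / decay)

opt = torch.optim.SGD(generator.parameters(), lr=2e-4)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
LAMBDA_VOXEL, LAMBDA_PERC, LAMBDA_COS = 100.0, 1.0, 1.0  # stated weights

for it in range(25_700):                  # 100 epochs x 257 training pairs
    I_CT = torch.rand(1, 1, 256, 256)     # placeholder batch of size one
    I_MR = torch.rand(1, 1, 256, 256)
    # Eq. (9) in full adds the GAN, perception, and cosine terms here.
    loss = LAMBDA_VOXEL * F.l1_loss(generator(I_CT), I_MR)
    opt.zero_grad(); loss.backward(); opt.step(); sched.step()
```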

Performance Evaluation Index
To evaluate the quality of the generated MR image, we use the mean absolute error (MAE) and root mean squared error (RMSE) of the voxels between the generated and the real MR image:

MAE = (1/N) Σ_{i=1}^{N} |I_MR(i) − G_MR(I_CT)(i)|

RMSE = √( (1/N) Σ_{i=1}^{N} (I_MR(i) − G_MR(I_CT)(i))² )

where N is the number of image voxel values. In addition, the peak signal-to-noise ratio (PSNR) is one of the important indicators for evaluating the imaging quality of medical images:

PSNR = 10 · log10( MAX² / MSE )

where the mean square error (MSE) is computed between the generated and the real MR image, and MAX = 10 is the maximum value of a pixel in the image.
The structural similarity (SSIM) and the Pearson correlation coefficient (PCC) between the generated and the real MR image are also adopted:

SSIM(I_MR, G_MR(I_CT)) = [(2 µ_{I_MR} µ_{G_MR(I_CT)} + V_1)(2 σ_{I_MR, G_MR(I_CT)} + V_2)] / [(µ_{I_MR}² + µ_{G_MR(I_CT)}² + V_1)(σ_{I_MR}² + σ_{G_MR(I_CT)}² + V_2)]

where V_1 and V_2 are constants to prevent the divisor from being zero; µ_{I_MR}, µ_{G_MR(I_CT)}, σ_{I_MR}, and σ_{G_MR(I_CT)} are the average values and standard deviations of the real and the generated MR image, respectively, and σ_{I_MR, G_MR(I_CT)} is their covariance. The smaller the values of MAE and RMSE, and the larger the values of PSNR, SSIM, and PCC, the higher the fidelity of the generated MR image.
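For reference, the five indices can be computed as in the sketch below; the global (rather than windowed) SSIM and the values of V_1 and V_2 are simplifying assumptions, and MAX follows the stated peak value of 10:

```python
# Sketch of the five evaluation indices for single-channel numpy images.
import numpy as np
from scipy.stats import pearsonr

MAX_VAL = 10.0  # stated maximum pixel value for PSNR

def evaluate(real, fake, v1=1e-4, v2=9e-4):  # v1, v2: small stabilizers (assumed)
    real, fake = real.astype(np.float64), fake.astype(np.float64)
    diff = real - fake
    mse = (diff ** 2).mean()
    mu_r, mu_f = real.mean(), fake.mean()
    var_r, var_f = real.var(), fake.var()
    cov = ((real - mu_r) * (fake - mu_f)).mean()
    ssim = ((2 * mu_r * mu_f + v1) * (2 * cov + v2)) / \
           ((mu_r**2 + mu_f**2 + v1) * (var_r + var_f + v2))
    return {"MAE": np.abs(diff).mean(),
            "RMSE": np.sqrt(mse),
            "PSNR": 10 * np.log10(MAX_VAL**2 / mse),
            "SSIM": ssim,
            "PCC": pearsonr(real.ravel(), fake.ravel())[0]}
```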

Comparison of Results under Different Objective Optimization Functions
To verify the effectiveness of the different items in the objective optimization function, we evaluate them by ablation. Table 3 compares four objective optimization function structures: (1) objective optimization using only the traditional GAN model function L_GAN for model training; (2) on the basis of L_GAN, the voxel loss term L_voxel between the generated and the real MR image is added for model training; (3) on the basis of (2), the perception loss term L_perc between the generated and the real MR image is added for model training; (4) on the basis of (3), the cosine loss term L_cos between the generated and the real MR image is added for model training.

Table 3. Ablation analysis of the objective function (↓ means that the smaller the value, the better; ↑ means that the larger the value, the better). The best scores are displayed in bold.

In Table 3, the values of MAE, RMSE, PSNR, SSIM, and PCC are the average measures over the test set. It can be seen that the system performance improves as the different items are added, and the integrated objective optimization function obtains the best performance. Therefore, the loss functions proposed in this paper perform well. The perceptual error characterizes the semantic error between the generated and real MR image, which drives the generated MR image to reflect more detailed texture features. The cosine loss treats the real and generated MR image discrimination as object re-identification, which improves the authenticity of the generated MR images.
Besides the quantitative analysis of the evaluation indicators above, we can obtain more image details by direct observation of the generated MR images. Figure 5 shows the visualization results of the generated MR images under the different objective functions. From the red rectangles in Figure 5, we can observe differences in complex structure and detailed texture features between the images. Some generated MR images are incomplete in structure and have obvious structural elements missing from the real MR images. Other generated MR images have low structural similarity, and many internal texture trends are not displayed. There are also generated MR images with insufficient high-level semantic extraction, in which detailed texture features are not obvious. In contrast, each loss item has a certain utility, and the objective function that integrates all the loss items significantly improves the quality of the generated MR images.


Comparison of Results under Different GAN Models
To prove the significance of the multi-conditional constraint GAN model proposed in this paper, we use the same dataset to train multi-channel GAN [22], DiscoGAN [24], Deep MR-to-CT [25], and MR-GAN [27], which are all applied to generate medical images.
The average values of MAE, RMSE, PSNR, SSIM, and PCC among the different algorithms are compared. As shown in Table 4, the multi-conditional constraint GAN model proposed in this paper performs the best. The imaging time is 70.51 ms, which is fast enough for generating the MR image. The model takes the generated MR image as input to the inverse generator to reconstruct the CT image, so the reversibility of the generated image is fully considered, which effectively improves the fidelity of the generated MR image. To intuitively reflect the performance indicators, we also present them as box plots. As in Figure 6, the dots on the left of each box represent the distribution of the individual generated MR images, the green triangles represent the average value over all generated images for each evaluation indicator, and the red dotted line connections allow a clear comparison of the models under the different evaluation indicators. Figure 6 indicates that the method proposed in this paper is superior to the other models in performance. The circles next to the box plots each represent a single image slice from the test dataset. The top and bottom box limits were calculated from Q25 and Q75. The green triangles and the horizontal lines denote the average and the median. The range of the box plot whiskers is given by [Q25 − 1.5 × (Q75 − Q25), Q75 + 1.5 × (Q75 − Q25)]. Any data point that falls outside this range is considered an outlier and is indicated by a red cross. The averages displayed in Table 4 indicate that our proposed method outperformed the other methods on all measures, with the lowest MAE and RMSE and the highest PSNR, SSIM, and PCC, further verifying the utility of our architecture.
Moreover, the superiority of the proposed algorithm can also be verified through the human visual mechanism. Figure 7 shows the test results of the different GAN models on samples randomly selected from the test set. It can be seen that the MR image generated by our method better reflects the detailed texture information of the imaged object with less noise. This shows that the multi-conditional constraint GAN model has stronger image construction capabilities and better reflects the continuity, smoothness, and semantics of imaging features in image generation. There are two main reasons. On the one hand, the reversibility of the cGAN model is fully considered: the CT image generated by the inverse generator reversely constrains the feature correspondence between the generated MR image and the input CT image. On the other hand, the representation of image semantic information by the deep CNN, as well as the modal attributes of the generated and real MR images, is fully considered; on this basis, we construct an objective optimization function including the voxel error, perception error, and cosine error between the generated and real MR images, which more effectively drives the model to generate MR images with clear detailed texture features and a high peak signal-to-noise ratio.

Conclusions
In this work, MR imaging is viewed as a machine perception problem, and a multi-conditional constraint GAN for MR imaging from CT scan data is proposed. Considering the reversibility of GAN, both a generator and an inverse generator are designed for MR and CT imaging, respectively, which constrain each other and improve the consistency between the features of the CT and MR images. In addition, we innovatively treat the discrimination between real and generated MR images as object re-identification; a cosine error fused with the voxel loss and perception loss is designed to enhance the fidelity and detailed texture feature representation of the generated MR image. Quantitative and qualitative experiments conducted on the challenging public CT-MR imaging dataset show a distinct performance improvement over other GANs utilized in medical imaging and demonstrate the effectiveness of our method for medical image modal transformation.